Skip to content

Support tool callbacks in MCP sampling#2998

Open
EronWright wants to merge 4 commits into
docker:mainfrom
EronWright:sampling-tools
Open

Support tool callbacks in MCP sampling#2998
EronWright wants to merge 4 commits into
docker:mainfrom
EronWright:sampling-tools

Conversation

@EronWright

@EronWright EronWright commented Jun 4, 2026

Copy link
Copy Markdown

Summary

Closes the tool callbacks functional gap in MCP sampling support — a follow-up to #2815, addressing one of the remaining items from #2809.

When an MCP server includes a tools array in a sampling/createMessage request, the host now drives its model with those tools and returns any tool_use blocks back to the server as ToolUseContent. The server remains responsible for executing the tool and continuing the loop in a follow-up sampling request.

sequenceDiagram
     participant H as cagent
      participant S as MCP Server
      participant L as LLM

      activate H
      H->>+S: tools/call {name, arguments}

      note over S: needs LLM inference

      S->>+H: sampling/createMessage<br/>{messages, tools: [...]}
      H->>+L: chat completion
      L-->>-H: ToolUseContent<br/>stopReason: "toolUse"
      H-->>-S: CreateMessageResult<br/>{tool_use, stopReason: "toolUse"}

      note over S: executes tool locally

      S->>+H: sampling/createMessage<br/>{messages + tool_use + tool_result, tools: [...]}
      H->>+L: chat completion
      L-->>-H: TextContent<br/>stopReason: "endTurn"
      H-->>-S: CreateMessageResult<br/>{text, stopReason: "endTurn"}

      S-->>-H: tool result
      deactivate H
Loading

What's new

  • New SamplingWithToolsHandler type and SampleableWithTools interface — additive, parallel to the existing SamplingHandler / Sampleable. No breaking changes to the basic sampling path merged in feat(mcp): add sampling/createMessage support #2815.
  • MCP toolset wires both handler types. At Initialize, exactly one of the SDK's mutually exclusive ClientOptions.CreateMessage* fields is populated — prefer with-tools when registered, fall back to basic.
  • Capability handshake advertises sampling.tools so servers know the host can receive tool-enabled requests.
  • Runtime handler (pkg/runtime/sampling.go):
    • Converts V2 multi-block messages: text, image/audio, tool_use → assistant ToolCalls, tool_resultMessageRoleTool rows (parallel tool_results expand to multiple chat.Message rows).
    • Converts []*mcp.Tool[]tools.Tool with a no-op handler (the server, not the host, executes).
    • Drives model.CreateChatCompletionStream, aggregates streamed tool calls.
    • Builds result Content with TextContent + ToolUseContent blocks; stopReason: "toolUse" when tool calls are present.
  • New limits: maxSamplingTools=64, maxSamplingToolCalls=32.
  • End-to-end test (e2e/sampling_test.go): mounts an in-process gomcp.NewServer on an httptest server via StreamableHTTPHandler. The server exposes one tool (ask_with_calculator) whose handler drives a real sampling-with-tools loop against the connecting cagent. The Gemini side is recorded once and replayed on subsequent runs, so the test runs offline in CI.

Out of scope (separate gaps from #2809)

  • Human-in-the-loop approval UI
  • Model-preference hints

Test plan

  • Unit tests pass: `go test ./pkg/runtime/... ./pkg/tools/...`
  • `go build ./...` clean
  • `go vet ./...` clean
  • `task lint` (0 offenses)
  • `gofmt` clean on all changed files
  • End-to-end via `TestExec_Gemini_SamplingWithTools` (cassette replays offline; verified once against live Gemini):
    • Handshake includes `sampling.tools` capability (implicit — `ServerSession.CreateMessageWithTools` would refuse otherwise)
    • LLM receives the server-supplied tools
    • Response contains `ToolUseContent` with `stopReason: "toolUse"` when the model emits a tool_use
    • Follow-up sampling request with `tool_result` blocks is converted correctly and the loop terminates with `endTurn`

@aheritier aheritier added area/agent For work that has to do with the general agent loop/agentic features of the app area/tools For features/issues/fixes related to the usage of built-in and MCP tools area/mcp MCP protocol, MCP tool servers, integration kind/feat PR adds a new feature (maps to feat: commit prefix) labels Jun 4, 2026
@EronWright EronWright marked this pull request as ready for review June 7, 2026 00:56
@EronWright EronWright requested a review from a team as a code owner June 7, 2026 00:56
@aheritier aheritier added the area/providers For features/issues/fixes related to LLM providers (Bedrock, LiteLLM, Qwen, custom, etc.) label Jun 7, 2026
@aheritier

Copy link
Copy Markdown
Contributor

/review

@docker-agent docker-agent left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟡 NEEDS ATTENTION

Two medium-severity findings in the newly-added sampling-with-tools code. The core stream-aggregation logic, capability handshake, content building, and limits enforcement are all well-structured and correctly tested.

Comment thread pkg/runtime/sampling.go Outdated
Comment thread pkg/tools/mcp/remote.go
@aheritier aheritier removed the area/providers For features/issues/fixes related to LLM providers (Bedrock, LiteLLM, Qwen, custom, etc.) label Jun 8, 2026
@EronWright EronWright marked this pull request as draft June 8, 2026 15:03
@EronWright EronWright requested a review from docker-agent June 8, 2026 20:35
@EronWright EronWright marked this pull request as ready for review June 8, 2026 20:38
@aheritier aheritier added area/testing Test infrastructure, CI/CD, test runners, evaluation area/providers/gemini Google Gemini provider support labels Jun 8, 2026
@dgageot

dgageot commented Jun 10, 2026

Copy link
Copy Markdown
Member

Sorry @EronWright, our project requires all the commit to be signed and verified. Could you sign them? Thanks!

@aheritier aheritier added area/testing Test infrastructure, CI/CD, test runners, evaluation status/blocked Author can't proceed and needs external help to get unblocked and removed area/testing Test infrastructure, CI/CD, test runners, evaluation area/providers/gemini Google Gemini provider support labels Jun 12, 2026
@aheritier aheritier marked this pull request as draft June 12, 2026 14:38
@aheritier

Copy link
Copy Markdown
Contributor

I moved it back to draft @EronWright the time you rewrite the history with signed commits. Sorry for the issue

@EronWright EronWright marked this pull request as ready for review June 15, 2026 15:06
@EronWright

Copy link
Copy Markdown
Author

@aheritier sorry for delay, all commits are now signed/verified.

Adds a parallel SamplingWithToolsHandler alongside the existing
SamplingHandler so MCP servers can include a tools array in
sampling/createMessage requests. The host drives its model with those
tools and returns any tool_use blocks as ToolUseContent; the server
remains responsible for executing the tool and continuing the loop in a
follow-up sampling request.

The initialize handshake now advertises sampling.tools capability, and
the MCP toolset selects the appropriate go-sdk handler (basic vs.
with-tools) based on which handler is registered.

Signed-off-by: Eron Wright <eronwright@gmail.com>
Mounts an in-process gomcp.NewServer on an httptest server via
StreamableHTTPHandler. Its one tool, ask_with_calculator, runs a
sampling loop: sends sampling/createMessage with a calculator tool,
gets a tool_use back from the host LLM, "executes" the calculator,
sends a follow-up sampling request carrying the tool_result, and
returns the final text. The Gemini side is recorded once and replayed
on subsequent runs, so the test runs offline in CI.

Signed-off-by: Eron Wright <eronwright@gmail.com>
- Reject tool_use blocks under a non-assistant role explicitly rather
  than silently dropping the message. The MCP spec places tool_use on
  assistant turns, but a malformed server would previously have its
  message disappear from the converted chat history with no error.
- Document the ordering dependency between SetSampling*Handler and
  Initialize in both stdio and remote MCP client setups, so future
  maintainers don't reorder them and silently lose the sampling
  handler on reconnect.

Signed-off-by: Eron Wright <eronwright@gmail.com>
Use errors.New for the static loop-exhausted message and strconv.FormatInt
for the calculator result instead of fmt.Errorf / fmt.Sprintf %d.

Signed-off-by: Eron Wright <eronwright@gmail.com>
@aheritier

Copy link
Copy Markdown
Contributor

/review

@docker-agent docker-agent left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟡 NEEDS ATTENTION

Two medium-severity issues found in the new sampling-with-tools path. Both relate to edge cases in the streaming and reconnect logic.

Comment thread pkg/runtime/sampling.go
if choice.FinishReason != "" {
providerFinishReason = choice.FinishReason
}
if choice.FinishReason == chat.FinishReasonStop || choice.FinishReason == chat.FinishReasonLength {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Early break on FinishReasonStop may drop tool-call chunks from later SSE events

drainSamplingStreamWithTools breaks out of the stream-read loop as soon as it sees FinishReasonStop or FinishReasonLength:

if choice.FinishReason == chat.FinishReasonStop || choice.FinishReason == chat.FinishReasonLength {
    break
}

Tool-call deltas that appear in the same chunk as the finish-reason are captured correctly (they're processed in the for _, delta := range choice.Delta.ToolCalls loop a few lines earlier). However, if a provider splits its response so that tool-call argument chunks arrive in subsequent SSE frames after the finish-reason frame, those deltas are silently discarded.

The analogous handleStream in streaming.go guards against this with a continue that skips the early-exit when the current chunk contains only tool-call deltas (no finish reason yet). That pattern is missing here.

Impact: Incomplete or missing tool calls are returned to the MCP server as a ToolUseContent block with truncated arguments, causing the server's tool-execution loop to fail or produce wrong results. The failure is silent — no error is returned.

Suggested fix: Mirror the streaming.go guard: after accumulating tool-call deltas, continue to the next chunk if choice.FinishReason == "", and only break on FinishReasonStop/FinishReasonLength once all pending deltas have been flushed.

Comment thread pkg/tools/mcp/stdio.go
PromptListChangedHandler: promptChanged,
}
switch {
case c.samplingWithToolsHandler != nil:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Reconnect race: sampling handler may not be registered with the SDK if configureToolsetHandlers hasn't run yet

Initialize (called on both the initial connect and every supervisor-driven reconnect) reads c.samplingWithToolsHandler and c.samplingHandler at call time to decide which SDK-level callback to register:

switch {
case c.samplingWithToolsHandler != nil:
    opts.CreateMessageWithToolsHandler = c.handleSamplingWithToolsRequest
case c.samplingHandler != nil:
    opts.CreateMessageHandler = c.handleSamplingRequest
}
// if both are nil → no sampling callback registered with the SDK client

The SDK client is constructed with opts and never updated afterwards. If a reconnect happens before the runtime's configureToolsetHandlers has called SetSamplingWithToolsHandler / SetSamplingHandler, both fields are nil at Initialize time. The resulting SDK session has no CreateMessage* handler at all, so any sampling/createMessage request from the server is rejected with an error for the lifetime of that session.

This is a regression from the previous unconditional registration pattern, where the basic sampling callback was always wired. The comment "callers must invoke … before any reconnect" documents the contract, but it is not enforced by any ordering guarantee in the reconnect path today.

Impact: An MCP server that triggers sampling during or shortly after a reconnect (e.g., a long-running agentic loop that reconnects due to a network hiccup) will receive an error from the sampling handler instead of a model response, breaking its tool-execution loop.

Suggested mitigation: Either (a) guarantee in the supervisor/reconnect path that configureToolsetHandlers is always called before Initialize completes, and add a test that verifies this ordering; or (b) fall back to registering a best-effort handler at Initialize time that reads the handler fields lazily at request time (similar to how handleSamplingWithToolsRequest already reads c.samplingWithToolsHandler under the mutex), so a late SetSamplingWithToolsHandler call is still effective even after reconnect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/agent For work that has to do with the general agent loop/agentic features of the app area/mcp MCP protocol, MCP tool servers, integration area/testing Test infrastructure, CI/CD, test runners, evaluation area/tools For features/issues/fixes related to the usage of built-in and MCP tools kind/feat PR adds a new feature (maps to feat: commit prefix) status/blocked Author can't proceed and needs external help to get unblocked

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants